Comparing machine learning approaches to predict SEEG accuracy

Stereoencephalography (SEEG) is a technique used in drug-resistant epilepsy patients that may be a candidate for surgical resection of the epileptogenic zone. Multiple electrodes are placed using a so-called "frame based" stereotactic approach, in our case using the Leksell frame. In our previous paper "Methodology, outcome, safety and in vivo accuracy in traditional frame-based stereoelectroencephalography" by Van der Loo et al (2017) we reported on SEEG electrode implantation accuracy in a cohort of 71 patients who were operated between September 2008 and April 2016, in whom a total of 902 electrodes were implanted. Data for in vivo application accuracy analysis were available for 866 electrodes.

The goal of the current project is to use a public version of this dataset (without any personal identifiers) to predict electrode implantation accuracy by using and comparing different machine learning approaches.

Pieter Kubben, MD, PhD
neurosurgeon @ Maastricht University Medical Center, The Netherlands

For any questions you can reach me by email or on Twitter.

Data description

The public dataset contains these variables:

PatientPosition: patient position during surgery (nominal: supine, prone)
Contacts: nr of contacts of electrode implanted (ordinal: 5, 8, 10, 12, 15, 18)
ElectrodeType: describes trajectory type (nominal: oblique, orthogonal). Oblique refers to implantation using the Leksell arc, and orthogonal using a dedicated L-piece mounted on the frame (mostly implants in temporal lobe) when arc angles become too high (approx > 155°) or too low (approx < 25°)
PlanningX: planned Cartesian X coord of target (numeric, in mm)
PlanningY: planned Cartesian Y coord of target (numeric, in mm)
PlanningZ: planned Cartesian Z coord of target (numeric, in mm)
PlanningRing: planned ring coord, the trajectory direction in sagittal plane (numeric, in degrees); defines entry
PlanningArc: planned arc coord (trajectory direction in coronal plane (numeric, in degrees); defines entry
DuraTipDistancePlanned: distance from dura mater (outer sheet covering the brain surface) to target (numeric, in mm)
EntryX: real Cartesian X coord of entry point (numeric, in mm)
EntryY: real Cartesian Y coord of entry point (numeric, in mm)
EntryZ: real Cartesian Z coord of entry point (numeric, in mm)
TipX: real Cartesian X coord of target point (numeric, in mm)
TipY: real Cartesian Y coord of target point (numeric, in mm)
TipZ: real Cartesian Z coord of target point (numeric, in mm)
SkinSkullDistance: distance between skin surfacce and skull surface (numeric, in mm)
SkullThickness: skull thickness (numeric, in mm)
SkullAngle: insertion angle of electrode relative to skull (numeric, in degrees)
ScrewLength: length of bone screw used to guide and fixate electrode (ordinal: 20, 25, 30, 35 mm)

The electrodes are the Microdeep depth electrodes by DIXI Medical.

To the limited extent possible in this case I tried to make these FAIR data and adhere to FAIR guiding principles. In practice this meant I introduced the topic, described my data and created a DOI.

Now let's get started.



In [90]:

    
# import libraries
import turicreate as tc
import h5py
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
plt.style.use('ggplot')
%matplotlib inline
import warnings; warnings.simplefilter('ignore')
#%xmode plain; # shorter error messages
pd.options.mode.chained_assignment = None 
# global setting whether to save figures or not
# will save as 300 dpi PNG - all filenames start with "fig_"
save_figures = False



In [2]:

    
# load data
electrodes = pd.read_csv('../data/electrodes_public.csv')



In [3]:

    
# find missing values
nan_rows = sum([True for idx,row in electrodes.iterrows() if any(row.isnull())])
print('Nr of rows with missing values:', nan_rows)









    



Nr of rows with missing values: 823



In [4]:

    
def missing_values_table(df): 
        mis_val = df.isnull().sum()
        mis_val_percent = 100 * df.isnull().sum()/len(df)
        mis_val_table = pd.concat([mis_val, mis_val_percent], axis=1)
        mis_val_table = mis_val_table.rename(columns = {0 : 'Missing Values', 1 : '% of Total Values'})
        return mis_val_table



In [5]:

    
missing_values_table(electrodes)









    Out[5]:







  
    
      
      Missing Values
      % of Total Values
    
  
  
    
      PatientPosition
      0
      0.000000
    
    
      Contacts
      56
      6.466513
    
    
      ElectrodeType
      0
      0.000000
    
    
      PlanningX
      0
      0.000000
    
    
      PlanningY
      0
      0.000000
    
    
      PlanningZ
      0
      0.000000
    
    
      PlanningRing
      271
      31.293303
    
    
      PlanningArc
      271
      31.293303
    
    
      DuraTipDistancePlanned
      54
      6.235566
    
    
      EntryX
      279
      32.217090
    
    
      EntryY
      119
      13.741339
    
    
      EntryZ
      131
      15.127021
    
    
      TipX
      0
      0.000000
    
    
      TipY
      0
      0.000000
    
    
      TipZ
      0
      0.000000
    
    
      SkinSkullDistance
      0
      0.000000
    
    
      SkullThickness
      0
      0.000000
    
    
      SkullAngle
      0
      0.000000
    
    
      ScrewLength
      540
      62.355658



In [6]:

    
#not used features
electrodes.drop(['EntryX', 'EntryY', 'EntryZ','ScrewLength','PlanningRing','PlanningArc'], axis = 1, inplace = True)



In [7]:

    
# calculate TPLE and remove entry data from dataframe
electrodes['TPLE'] = np.sqrt(np.square(electrodes['TipX'] - electrodes['PlanningX']) + 
                              np.square(electrodes['TipY'] - electrodes['PlanningY']) + 
                              np.square(electrodes['TipZ'] - electrodes['PlanningZ'])
                             ).round(1)



In [8]:

    
#plt.figure(figsize=[16, 8])
#sns.distplot(electrodes['TPLE'], hist=False, rug=True)
#sns.distplot(electrodes['TPLE'])



In [9]:

    
plt.figure(figsize=[16, 8])
sns.kdeplot(electrodes[electrodes['PatientPosition']=='Supine']['TPLE'], label='Supine')
sns.kdeplot(electrodes[electrodes['PatientPosition']=='Prone']['TPLE'], label='Prone',  color='y')









    Out[9]:





<matplotlib.axes._subplots.AxesSubplot at 0x1a20b8c9e8>

It looks like their distributions come from different samples



In [10]:

    
print('TPLE Mean Supine: {} (SD {}) \nTPLE Mean Prone: {} (SD {})'.format(
    round(electrodes[electrodes['PatientPosition']=='Supine']['TPLE'].mean(),2), round(electrodes[electrodes['PatientPosition']=='Supine']['TPLE'].std(),2), 
    round(electrodes[electrodes['PatientPosition']=='Prone']['TPLE'].mean(),2), round(electrodes[electrodes['PatientPosition']=='Prone']['TPLE'].std(),2)))









    



TPLE Mean Supine: 3.45 (SD 2.88) 
TPLE Mean Prone: 3.92 (SD 2.75)

Let's prove if indeed are independet samples with a t-test. If we observe a large p-value, then we cannot reject the null hypothesis of identical means. If the p-value is smaller than alpha, then we reject the null hypothesis of equal means.



In [11]:

    
alpha = 0.05
t_stat, p = stats.mstats.ttest_ind(electrodes[electrodes['PatientPosition']=='Supine']['TPLE'],
                                   electrodes[electrodes['PatientPosition']=='Prone']['TPLE'])
if p < alpha: 
    print("p= {} The null hypothesis can be rejected".format(p))
else:
    print("p= {} The null hypothesis cannot be rejected".format(p))









    



p= 0.04187565196970924 The null hypothesis can be rejected

So it means the two groups came from diferent populations, and we might trate them different

We do the same with electrode type



In [12]:

    
print('TPLE Mean Oblique: {} (SD {}) \nTPLE Mean Orthogonal: {} (SD {})'.format(
    round(electrodes[electrodes['ElectrodeType']=='Oblique']['TPLE'].mean(),2), round(electrodes[electrodes['ElectrodeType']=='Oblique']['TPLE'].std(),2), 
    round(electrodes[electrodes['ElectrodeType']=='Orthogonal']['TPLE'].mean(),2), round(electrodes[electrodes['ElectrodeType']=='Orthogonal']['TPLE'].std(),2)))









    



TPLE Mean Oblique: 3.44 (SD 2.93) 
TPLE Mean Orthogonal: 3.81 (SD 2.68)

Computing t-test.



In [13]:

    
alpha = 0.05
t_stat, p = stats.mstats.ttest_ind(electrodes[electrodes['ElectrodeType']=='Oblique']['TPLE'],
                                   electrodes[electrodes['ElectrodeType']=='Orthogonal']['TPLE'])
if p < alpha: 
    print("p= {} The null hypothesis can be rejected".format(p))
else:
    print("p= {} The null hypothesis cannot be rejected".format(p))









    



p= 0.07447724453227494 The null hypothesis cannot be rejected



In [14]:

    
#missing_values_table(electrodes)



In [15]:

    
#taking into account came from different distributions
df_supine = electrodes[electrodes['PatientPosition']=='Supine']
df_prone = electrodes[electrodes['PatientPosition']=='Prone']

Input missing variables

We start with the variable to input DuraTipDistancePlanned, but first take a look to the distribution



In [16]:

    
dura_sup = df_supine['DuraTipDistancePlanned'].dropna()
dura_pro = df_prone['DuraTipDistancePlanned'].dropna()



In [17]:

    
# normallity test
def normal_test(vector, alpha = 0.05):
    k2, p = stats.mstats.normaltest(vector)
    if p < alpha: # null hypothesis: x comes from a normal distribution
        print("p = {} The null hypothesis can be rejected".format(p))
    else: 
        print("p = {} The null hypothesis cannot be rejected".format(p))



In [18]:

    
#electrodes['DuraTipDistancePlanned'].hist()
plt.figure(figsize=[16, 8])
sns.distplot(dura_sup)
sns.distplot(dura_pro)
normal_test(dura_sup)
normal_test(dura_pro)









    



p = 5.988057054136884e-17 The null hypothesis can be rejected
p = 0.0005791646181393273 The null hypothesis can be rejected

They are normal distribuited so we might input variables with the statistics of the variable



In [19]:

    
def input_mean(df, variable):
    df_null = df[df[variable].isnull()]
    df_not_null = df[df[variable].notnull()]
    df_null[variable] = round(df_not_null[variable].mean(),1)
    inputed_df = df_null.append(df_not_null).sort_index()
    return inputed_df



In [20]:

    
df_supine = input_mean(df_supine, 'DuraTipDistancePlanned')



In [21]:

    
df_prone = input_mean(df_prone, 'DuraTipDistancePlanned')

We merge again



In [22]:

    
electrodes = df_supine.append(df_prone).sort_index()



In [23]:

    
#missing_values_table(electrodes)

Next variable to input is the number of contacts



In [24]:

    
#electrodes.insert(0, 'Index', electrodes.index)



In [25]:

    
contact_null = electrodes[electrodes['Contacts'].isnull()]
contact_not_null = electrodes[electrodes['Contacts'].notnull()]



In [26]:

    
contact_not_null['Contacts'] = contact_not_null['Contacts'].astype(int)

We need to convert into sframe



In [27]:

    
sf_contact_not_null = tc.SFrame(data=contact_not_null)



In [28]:

    
# features for contacts classifier
features = ['PatientPosition', 'ElectrodeType', 'PlanningX', 'PlanningY', 'PlanningZ', 'DuraTipDistancePlanned', 'SkinSkullDistance', 'SkullThickness', 'SkullAngle']



In [29]:

    
# We train a classifier
train_data, test_data = sf_contact_not_null.random_split(0.85)

clf1 = tc.classifier.create(train_data, target='Contacts', features = features)









    



PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
          You can set ``validation_set=None`` to disable validation tracking.

PROGRESS: The following methods are available for this type of problem.
PROGRESS: BoostedTreesClassifier, RandomForestClassifier, DecisionTreeClassifier, LogisticClassifier
PROGRESS: The returned model will be chosen according to validation accuracy.






    




Boosted trees classifier:






    




--------------------------------------------------------






    




Number of examples          : 658






    




Number of classes           : 6






    




Number of feature columns   : 9






    




Number of unpacked features : 9






    




+-----------+--------------+-------------------+---------------------+-------------------+---------------------+






    




| Iteration | Elapsed Time | Training-accuracy | Validation-accuracy | Training-log_loss | Validation-log_loss |






    




+-----------+--------------+-------------------+---------------------+-------------------+---------------------+






    




| 1         | 0.025062     | 0.820669          | 0.545455            | 1.309959          | 1.496785            |






    




| 2         | 0.044466     | 0.851064          | 0.545455            | 1.053785          | 1.417049            |






    




| 3         | 0.063745     | 0.873860          | 0.545455            | 0.871295          | 1.364482            |






    




| 4         | 0.090400     | 0.886018          | 0.545455            | 0.736904          | 1.317326            |






    




| 5         | 0.111388     | 0.899696          | 0.545455            | 0.634599          | 1.318138            |






    




| 6         | 0.131019     | 0.914894          | 0.545455            | 0.545460          | 1.321431            |






    




+-----------+--------------+-------------------+---------------------+-------------------+---------------------+






    




Random forest classifier:






    




--------------------------------------------------------






    




Number of examples          : 658






    




Number of classes           : 6






    




Number of feature columns   : 9






    




Number of unpacked features : 9






    




+-----------+--------------+-------------------+---------------------+-------------------+---------------------+






    




| Iteration | Elapsed Time | Training-accuracy | Validation-accuracy | Training-log_loss | Validation-log_loss |






    




+-----------+--------------+-------------------+---------------------+-------------------+---------------------+






    




| 1         | 0.029227     | 0.727964          | 0.424242            | 0.919855          | 1.540483            |






    




| 2         | 0.045659     | 0.808511          | 0.515152            | 0.844761          | 1.434776            |






    




| 3         | 0.060199     | 0.834347          | 0.515152            | 0.800702          | 1.419205            |






    




| 4         | 0.074599     | 0.841945          | 0.515152            | 0.798519          | 1.411447            |






    




| 5         | 0.088015     | 0.855623          | 0.606061            | 0.776646          | 1.362726            |






    




| 6         | 0.101677     | 0.866261          | 0.575758            | 0.771824          | 1.355830            |






    




+-----------+--------------+-------------------+---------------------+-------------------+---------------------+






    




Decision tree classifier:






    




--------------------------------------------------------






    




Number of examples          : 658






    




Number of classes           : 6






    




Number of feature columns   : 9






    




Number of unpacked features : 9






    




+-----------+--------------+-------------------+---------------------+-------------------+---------------------+






    




| Iteration | Elapsed Time | Training-accuracy | Validation-accuracy | Training-log_loss | Validation-log_loss |






    




+-----------+--------------+-------------------+---------------------+-------------------+---------------------+






    




| 1         | 0.025007     | 0.820669          | 0.545455            | 0.666157          | 1.287582            |






    




+-----------+--------------+-------------------+---------------------+-------------------+---------------------+






    




Logistic regression:






    




--------------------------------------------------------






    




Number of examples          : 658






    




Number of classes           : 6






    




Number of feature columns   : 9






    




Number of unpacked features : 9






    




Number of coefficients      : 50






    




Starting Newton Method






    




--------------------------------------------------------






    




+-----------+----------+--------------+-------------------+---------------------+






    




| Iteration | Passes   | Elapsed Time | Training-accuracy | Validation-accuracy |






    




+-----------+----------+--------------+-------------------+---------------------+






    




| 1         | 2        | 0.003473     | 0.489362          | 0.333333            |






    




| 2         | 3        | 0.007049     | 0.536474          | 0.424242            |






    




| 3         | 4        | 0.011057     | 0.545593          | 0.454545            |






    




| 4         | 5        | 0.014755     | 0.553191          | 0.454545            |






    




| 5         | 6        | 0.018482     | 0.551672          | 0.454545            |






    




| 6         | 7        | 0.022079     | 0.551672          | 0.454545            |






    




+-----------+----------+--------------+-------------------+---------------------+






    



PROGRESS: Model selection based on validation accuracy:
PROGRESS: ---------------------------------------------
PROGRESS: BoostedTreesClassifier          : 0.5454545021057129
PROGRESS: RandomForestClassifier          : 0.5454545021057129
PROGRESS: DecisionTreeClassifier          : 0.5454545021057129
PROGRESS: LogisticClassifier              : 0.454545
PROGRESS: ---------------------------------------------
PROGRESS: Selecting BoostedTreesClassifier based on validation set performance.






    




SUCCESS: Optimal solution found.



In [30]:

    
clf1









    Out[30]:





Class                          : BoostedTreesClassifier

Schema
------
Number of examples             : 658
Number of feature columns      : 9
Number of unpacked features    : 9
Number of classes              : 6

Settings
--------
Number of trees                : 60
Max tree depth                 : 6
Training time (sec)            : 0.2123
Training accuracy              : 0.965
Validation accuracy            : 0.5455
Training log_loss              : 0.335
Validation log_loss            : 1.4106



In [31]:

    
#clf1.evaluate(test_data)
#clf1.evaluate(train_data)



In [32]:

    
#now convert the rest of the data
sf_contact_null = tc.SFrame(data=contact_null)
new_contacts = clf1.predict(sf_contact_null[features])
contact_null['Contacts'] = new_contacts

New electrodes table with inputed values



In [33]:

    
electrodes = contact_null.append(contact_not_null).sort_index()



In [34]:

    
#missing_values_table(electrodes)
# No missing values anymore

How the categories look like on the two different groups?



In [35]:

    
contacts_supine = df_supine.groupby('Contacts').count()['TPLE']
contacts_prone = df_prone.groupby('Contacts').count()['TPLE']



In [36]:

    
plt.figure(figsize=[20, 8])
plt.subplot(1,2,1); plt.xticks(contacts_supine.index)
plt.bar(contacts_supine.index, contacts_supine)
plt.subplot(1,2,2), plt.xticks(contacts_prone.index)
plt.bar(contacts_prone.index, contacts_prone, color='y')









    Out[36]:





<BarContainer object of 6 artists>

Target Variables

Now we will remove large outliers (difference between planned and real coord) in the Z-axis as electrode insertion length (depth) is influenced also by other factors (calculations regarding depth which could lead to either too superficial or too deep, but also possible malfixation of the screw cap which may cause loosening of the electrode and hence a more superficial position.. it won't migrate into the depth spontaneously). These are very limited numbers and would too much influence further analysis.



In [37]:

    
plt.figure(figsize=[16, 8])
coordinates = ['PlanningX','PlanningY','PlanningZ','TipX','TipY','TipZ']
for c in coordinates:
    sns.kdeplot(electrodes[str(c)], label=str(c))

Every Axis follows a particular distribution, let's try a normal test



In [38]:

    
for c in coordinates:
    normal_test(electrodes[str(c)])









    



p = 6.9465819854989e-84 The null hypothesis can be rejected
p = 1.882265752281482e-07 The null hypothesis can be rejected
p = 0.0002928270371526327 The null hypothesis can be rejected
p = 1.404931961375195e-62 The null hypothesis can be rejected
p = 1.9081660369783106e-06 The null hypothesis can be rejected
p = 0.00010052998936429912 The null hypothesis can be rejected

Not normal distribuited



In [39]:

    
electrodes.head()









    Out[39]:







  
    
      
      PatientPosition
      Contacts
      ElectrodeType
      PlanningX
      PlanningY
      PlanningZ
      DuraTipDistancePlanned
      TipX
      TipY
      TipZ
      SkinSkullDistance
      SkullThickness
      SkullAngle
      TPLE
    
  
  
    
      0
      Supine
      18
      Oblique
      125.8
      106.5
      135.5
      86.6
      126.4
      106.8
      135.2
      7.0
      9.7
      70.3
      0.7
    
    
      1
      Supine
      18
      Oblique
      130.6
      131.0
      136.4
      105.9
      134.7
      132.9
      136.0
      9.4
      7.4
      63.8
      4.5
    
    
      2
      Supine
      10
      Oblique
      139.1
      104.9
      124.0
      35.2
      139.1
      108.3
      124.4
      9.8
      5.5
      66.4
      3.4
    
    
      3
      Supine
      10
      Oblique
      137.7
      112.2
      115.0
      35.9
      136.4
      115.2
      115.0
      8.4
      6.5
      66.3
      3.3
    
    
      4
      Supine
      12
      Oblique
      126.0
      76.0
      124.1
      39.5
      124.6
      75.8
      126.8
      7.4
      4.6
      84.7
      3.0

We need to structure our data properly for further analysis and convert the categorical variables (nominal, ordinal) to the category type.

Regression Model to Predict Coordinates



In [40]:

    
electrodes.drop('TPLE', axis = 1, inplace = True)



In [41]:

    
#electrodes.insert(0, 'Index', range(0, len(electrodes), 1))
sf_electrodes = tc.SFrame(data=electrodes)



In [42]:

    
sf_electrodes.head()









    Out[42]:





    
        PatientPosition
        Contacts
        ElectrodeType
        PlanningX
        PlanningY
        PlanningZ
        DuraTipDistancePlanned
        TipX
    
    
        Supine
        18
        Oblique
        125.8
        106.5
        135.5
        86.6
        126.4
    
    
        Supine
        18
        Oblique
        130.6
        131.0
        136.4
        105.9
        134.7
    
    
        Supine
        10
        Oblique
        139.1
        104.9
        124.0
        35.2
        139.1
    
    
        Supine
        10
        Oblique
        137.7
        112.2
        115.0
        35.9
        136.4
    
    
        Supine
        12
        Oblique
        126.0
        76.0
        124.1
        39.5
        124.6
    
    
        Supine
        15
        Oblique
        134.8
        121.8
        119.8
        55.3
        134.2
    
    
        Supine
        15
        Oblique
        131.8
        117.3
        118.4
        87.9
        133.5
    
    
        Supine
        18
        Oblique
        66.4
        123.9
        116.4
        74.1
        67.4
    
    
        Supine
        10
        Oblique
        146.0
        58.0
        109.9
        32.6
        150.1
    
    
        Supine
        10
        Orthogonal
        131.1
        85.4
        108.8
        49.3
        127.8
    


    
        TipY
        TipZ
        SkinSkullDistance
        SkullThickness
        SkullAngle
    
    
        106.8
        135.2
        7.0
        9.7
        70.3
    
    
        132.9
        136.0
        9.4
        7.4
        63.8
    
    
        108.3
        124.4
        9.8
        5.5
        66.4
    
    
        115.2
        115.0
        8.4
        6.5
        66.3
    
    
        75.8
        126.8
        7.4
        4.6
        84.7
    
    
        122.6
        121.3
        6.9
        11.1
        67.0
    
    
        117.9
        116.9
        8.0
        9.9
        72.8
    
    
        127.5
        114.4
        8.9
        11.5
        61.2
    
    
        59.9
        111.4
        13.0
        9.6
        50.1
    
    
        84.6
        108.8
        8.5
        6.5
        82.6
    

[10 rows x 13 columns]



In [43]:

    
sf_electrodes.num_rows()









    Out[43]:





866



In [44]:

    
features_train = ['PatientPosition',
 'Contacts',
 'ElectrodeType',
 'PlanningX',
 'PlanningY',
 'PlanningZ',
 'DuraTipDistancePlanned','SkinSkullDistance',
 'SkullThickness',
 'SkullAngle']



In [45]:

    
targets_train = ['TipX', 'TipY', 'TipZ']



In [46]:

    
train_data, test_data = sf_electrodes.random_split(0.8)



In [47]:

    
def multi_boosted_tree(dataset, features, targets):
    models = []
    for target in targets:
        reg = tc.boosted_trees_regression.create(dataset, target=target, features = features, verbose=False)
        models.append(reg)
    return models



In [48]:

    
MBT = multi_boosted_tree(train_data, features_train, targets_train)



In [49]:

    
MBT









    Out[49]:





[Class                          : BoostedTreesRegression
 
 Schema
 ------
 Number of examples             : 655
 Number of feature columns      : 10
 Number of unpacked features    : 10
 
 Settings
 --------
 Number of trees                : 10
 Max tree depth                 : 6
 Training time (sec)            : 0.0321
 Training rmse                  : 3.5981
 Validation rmse                : 4.3997
 Training max_error             : 14.8011
 Validation max_error           : 11.5872,
 Class                          : BoostedTreesRegression
 
 Schema
 ------
 Number of examples             : 660
 Number of feature columns      : 10
 Number of unpacked features    : 10
 
 Settings
 --------
 Number of trees                : 10
 Max tree depth                 : 6
 Training time (sec)            : 0.0254
 Training rmse                  : 3.6595
 Validation rmse                : 5.0908
 Training max_error             : 18.2009
 Validation max_error           : 14.4441,
 Class                          : BoostedTreesRegression
 
 Schema
 ------
 Number of examples             : 646
 Number of feature columns      : 10
 Number of unpacked features    : 10
 
 Settings
 --------
 Number of trees                : 10
 Max tree depth                 : 6
 Training time (sec)            : 0.0232
 Training rmse                  : 3.6027
 Validation rmse                : 3.9878
 Training max_error             : 18.9048
 Validation max_error           : 13.8965]



In [50]:

    
MBT[1].get_feature_importance()









    Out[50]:





    
        name
        index
        count
    
    
        PlanningY
        None
        78
    
    
        PlanningX
        None
        17
    
    
        DuraTipDistancePlanned
        None
        11
    
    
        PlanningZ
        None
        8
    
    
        SkullThickness
        None
        7
    
    
        Contacts
        None
        6
    
    
        SkullAngle
        None
        5
    
    
        SkinSkullDistance
        None
        5
    
    
        ElectrodeType
        Oblique
        2
    
    
        PatientPosition
        Supine
        1
    

[12 rows x 3 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.



In [51]:

    
# Evaluate the model and save the results into a dictionary
#results = 
for i in range(3):
    print(MBT[i-1].evaluate(test_data))









    



{'max_error': 17.067489624023438, 'rmse': 4.474193676667111}
{'max_error': 16.47852325439453, 'rmse': 3.749562618801999}
{'max_error': 20.500869750976562, 'rmse': 4.282052307036403}

TPLE Comparsison with MBT



In [52]:

    
test_data['PredictX'] = [round(i,3) for i in MBT[0].predict(test_data)]
test_data['PredictY'] = [round(i,3) for i in MBT[1].predict(test_data)]
test_data['PredictZ'] = [round(i,3) for i in MBT[2].predict(test_data)]



In [53]:

    
test_data['TPLE'] = np.sqrt(np.square(test_data['TipX'] - test_data['PlanningX']) + 
                              np.square(test_data['TipY'] - test_data['PlanningY']) + 
                              np.square(test_data['TipZ'] - test_data['PlanningZ'])
                             ).round(1)

test_data['TPLE_Pred'] = np.sqrt(np.square(test_data['PredictX'] - test_data['PlanningX']) + 
                              np.square(test_data['PredictY'] - test_data['PlanningY']) + 
                              np.square(test_data['PredictZ'] - test_data['PlanningZ'])
                             ).round(1)



In [54]:

    
test_data









    Out[54]:





    
        PatientPosition
        Contacts
        ElectrodeType
        PlanningX
        PlanningY
        PlanningZ
        DuraTipDistancePlanned
        TipX
    
    
        Supine
        18
        Oblique
        125.8
        106.5
        135.5
        86.6
        126.4
    
    
        Supine
        8
        Orthogonal
        65.7
        112.6
        99.9
        40.7
        66.5
    
    
        Supine
        8
        Orthogonal
        67.4
        119.8
        113.3
        40.3
        68.3
    
    
        Supine
        15
        Orthogonal
        85.9
        57.8
        90.6
        58.8
        86.3
    
    
        Supine
        15
        Oblique
        59.5
        87.7
        64.4
        23.4
        58.7
    
    
        Supine
        12
        Oblique
        131.5
        89.4
        77.9
        45.2
        130.4
    
    
        Prone
        18
        Oblique
        73.4
        101.0
        134.2
        86.9
        71.5
    
    
        Prone
        18
        Oblique
        55.2
        104.9
        150.1
        100.1
        53.8
    
    
        Prone
        8
        Oblique
        65.8
        97.2
        111.8
        54.4
        64.2
    
    
        Supine
        8
        Oblique
        77.0
        121.6
        75.7
        28.2
        76.6
    


    
        TipY
        TipZ
        SkinSkullDistance
        SkullThickness
        SkullAngle
        PredictX
        PredictY
        PredictZ
        TPLE
        TPLE_Pred
    
    
        106.8
        135.2
        7.0
        9.7
        70.3
        124.372
        103.53
        132.015
        0.7
        4.8
    
    
        116.5
        98.7
        7.8
        6.0
        81.1
        64.963
        109.796
        98.047
        4.2
        3.4
    
    
        124.7
        114.2
        9.7
        4.8
        75.0
        66.017
        120.13
        110.395
        5.1
        3.2
    
    
        56.3
        88.5
        7.3
        5.0
        74.3
        85.268
        55.077
        86.853
        2.6
        4.7
    
    
        87.8
        63.7
        8.0
        11.2
        67.9
        58.403
        85.078
        64.483
        1.1
        2.8
    
    
        89.7
        78.7
        3.8
        4.0
        70.0
        125.649
        85.699
        76.604
        1.4
        7.0
    
    
        100.0
        135.7
        7.6
        8.8
        63.4
        69.97
        97.498
        127.949
        2.6
        7.9
    
    
        103.1
        152.4
        8.3
        8.3
        58.1
        46.963
        101.702
        138.595
        3.2
        14.5
    
    
        94.9
        114.3
        7.7
        6.5
        53.9
        63.215
        94.081
        111.96
        3.8
        4.1
    
    
        120.7
        76.4
        5.0
        7.0
        74.7
        75.512
        120.13
        75.303
        1.2
        2.1
    

[180 rows x 18 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.



In [91]:

    
#test_data['TPLE'].show()



In [92]:

    
#test_data['TPLE_Pred'].show()



In [57]:

    
rmse_1 = tc.evaluation.rmse(test_data['TPLE'], test_data['TPLE_Pred'])
print('Root Mean Squared Error for Boosted Tree Regression \n{}'.format(rmse_1))









    



Root Mean Squared Error for Boosted Tree Regression 
3.6

Point Estimator



In [58]:

    
td_supine_tple = test_data[test_data['PatientPosition']=='Supine']['TPLE']
td_prone_tple = test_data[test_data['PatientPosition']=='Prone']['TPLE']



In [59]:

    
print('TPLE Mean Supine: {} (SD {}) \nTPLE Mean Prone: {} (SD {})'.format(
    round(td_supine_tple.mean(),2), round(td_supine_tple.std(),2), 
    round(td_prone_tple.mean(),2), round(td_prone_tple.std(),2)))









    



TPLE Mean Supine: 3.32 (SD 2.0) 
TPLE Mean Prone: 4.28 (SD 3.17)



In [60]:

    
estimators = []
for i in range(len(test_data)):
    if test_data['PatientPosition'][i] == 'Supine':
        estimators.append(round(td_supine_tple.mean(),2))
    else:
        estimators.append(round(td_prone_tple.mean(),2))



In [61]:

    
test_data['TPLE_point'] = estimators



In [62]:

    
rmse_2 = tc.evaluation.rmse(test_data['TPLE'], test_data['TPLE_point'])
print('Root Mean Squared Error for Point Estimator \n{}'.format(rmse_2))









    



Root Mean Squared Error for Point Estimator 
2.337952570567295



In [63]:

    
#test_data.print_rows(172,19)

Deep learning

As a quick glance to deep learning we will apply a Multilayer Perceptron (MLP) for multi-class softmax classification using stochastic gradient descent as an optimizer. The code is borrowed from the official keras Sequential model guide and adapted where needed.



In [64]:

    
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler



In [65]:

    
#First some preprocessing
X = electrodes[features_train].values
#column 0 and 2 are categorical
y = electrodes[targets_train].values



In [66]:

    
labelencoder_X_1 = LabelEncoder()
X[:, 0] = labelencoder_X_1.fit_transform(X[:, 0])
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])



In [67]:

    
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state=80)



In [68]:

    
sc = StandardScaler()
X_trainS = sc.fit_transform(X_train)
X_testS = sc.transform(X_test)



In [89]:

    
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
from keras.optimizers import SGD
from keras.utils.np_utils import to_categorical
from keras.callbacks import ModelCheckpoint



In [70]:

    
batch_size = 2
epochs = 100



In [71]:

    
model = Sequential()
model.add(Dense(output_dim = 60, activation = 'relu', input_dim = X_train.shape[1])) #(10,)
model.add(Dense(output_dim = 60, activation = 'relu'))
model.add(Dropout(0.5))
model.add(Dense(output_dim = 3, activation = 'linear')) # dim output (3, ) last layer should have linear activations like no activators



In [72]:

    
model.summary()









    



_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 60)                660       
_________________________________________________________________
dense_2 (Dense)              (None, 60)                3660      
_________________________________________________________________
dropout_1 (Dropout)          (None, 60)                0         
_________________________________________________________________
dense_3 (Dense)              (None, 3)                 183       
=================================================================
Total params: 4,503
Trainable params: 4,503
Non-trainable params: 0
_________________________________________________________________



In [73]:

    
#sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(optimizer = 'adam', loss = 'mean_squared_error', metrics = ['accuracy']) #mse



In [74]:

    
#epoch number, 2 decimals - valaccuracy 2 float
filepath = "../models/weights-nn-{epoch:02d}-{val_acc:.2f}.h5"
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=0, mode='max')

model.fit(X_trainS, y_train,
          batch_size=batch_size, verbose=0,
          epochs=epochs,
          validation_data=(X_testS, y_test),
          callbacks=[checkpoint])









    Out[74]:





<keras.callbacks.History at 0x1a3b09ee80>



In [76]:

    
scores = model.evaluate(X_testS, y_test, batch_size=2)
print('\n\nrmse: {:.2f}\n\n{}: {:.2f}'.format(np.sqrt(scores[0]), model.metrics_names[1], scores[1]))









    



174/174 [==============================] - 0s 433us/step


rmse: 7.27

acc: 0.95



In [77]:

    
y_pred = model.predict(X_testS)

We are predicting the real carteasian points, so we want to calculate a "real TPLE" before the operation

TPLE Comparsison for NN



In [78]:

    
#feature_idx, feature_name = [], []
#features = []
#for idx, i in enumerate(electrodes.columns):
#    features.append((idx,i))
#    feature_idx.append(idx)
#    feature_name.append(i)



In [79]:

    
list_coor = ['PlanningX','PlanningY','PlanningZ','TipX','TipY','TipZ']
X = electrodes[list_coor].values #trying the TPLE calculation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state=80)



In [80]:

    
coordenates = pd.DataFrame() 
coordenates['PlanningX'] = [i for i in X_test[:,0]]
coordenates['PlanningY'] = [i for i in X_test[:,1]]
coordenates['PlanningZ'] = [i for i in X_test[:,2]]
coordenates['TipX'] = [i for i in X_test[:,3]]
coordenates['TipY'] = [i for i in X_test[:,4]]
coordenates['TipZ'] = [i for i in X_test[:,5]]
coordenates['PredictX'] = [y_pred[i][0] for i in range(len(y_pred))]
coordenates['PredictY'] = [y_pred[i][1] for i in range(len(y_pred))]
coordenates['PredictZ'] = [y_pred[i][2] for i in range(len(y_pred))]



In [81]:

    
coordenates.head()



In [82]:

    
coordenates['TPLE'] = np.sqrt(np.square(coordenates['TipX'] - coordenates['PlanningX']) + 
                              np.square(coordenates['TipY'] - coordenates['PlanningY']) + 
                              np.square(coordenates['TipZ'] - coordenates['PlanningZ'])
                             ).round(1)

coordenates['TPLE_Pred'] = np.sqrt(np.square(coordenates['PredictX'] - coordenates['PlanningX']) + 
                              np.square(coordenates['PredictY'] - coordenates['PlanningY']) + 
                              np.square(coordenates['PredictZ'] - coordenates['PlanningZ'])
                             ).round(1)



In [83]:

    
coordenates.head(10)



In [84]:

    
sf_coordenates = tc.SFrame(data=coordenates)



In [85]:

    
rmse_3 = tc.evaluation.rmse(sf_coordenates['TPLE'], sf_coordenates['TPLE_Pred'])
print('Root Mean Squared Error for Deep Learning \n{}'.format(rmse_3))









    



Root Mean Squared Error for Deep Learning 
8.749574702307786

Preliminary Results



In [86]:

    
rmseS = [rmse_1, rmse_2, rmse_3]
rsme_accuracy = [round(1 - (i/max(coordenates['TPLE'])),4) for i in rmseS]
rsme_accuracy









    Out[86]:





[0.8182, 0.8819, 0.5581]



In [88]:

    
#rmse the smaller the better
y=rsme_accuracy#[round(rmse_1,1),round(rmse_2,1),round(rmse_3,1)]; 
x=list(range(len(y)))
top = [i+0.1 for i in y]; bot = [i-0.1 for i in y]; inter = list(zip(bot,top))
plt.figure(figsize=(16,8))
plt.errorbar(x,y,yerr=[(top-bot)/2 for top,bot in inter], fmt='o', label='Accuracy', color='red', linewidth=3)
plt.plot(x,y, color='black'); plt.legend(); plt.title('Accuracy of RMSE for TPLE')
plt.text(x[0]+.05, y[0]-.01, 'Multi-outupt Boosted Trees')
plt.text(x[1]+.05, y[1]+.01, 'Group Point Estimate')
plt.text(x[2]-.2, y[2]-.01, 'Deep Learning')









    Out[88]:





Text(1.8,0.5481,'Deep Learning')



In [ ]:

	PlanningX	PlanningY	PlanningZ	TipX	TipY	TipZ	PredictX	PredictY	PredictZ
0	92.6	54.1	110.0	94.2	45.7	110.0	84.341675	56.625088	102.770157
1	94.8	116.1	70.3	95.7	115.9	68.8	88.230461	105.434898	67.722328
2	68.2	95.9	67.8	71.6	93.8	69.8	63.377934	84.028664	64.539787
3	62.7	87.8	136.9	61.9	87.2	138.7	66.507332	87.694168	125.087517
4	131.7	83.8	65.8	131.1	82.5	64.6	121.721565	78.528969	69.191719

	PlanningX	PlanningY	PlanningZ	TipX	TipY	TipZ	PredictX	PredictY	PredictZ	TPLE	TPLE_Pred
0	92.6	54.1	110.0	94.2	45.7	110.0	84.341675	56.625088	102.770157	8.6	11.3
1	94.8	116.1	70.3	95.7	115.9	68.8	88.230461	105.434898	67.722328	1.8	12.8
2	68.2	95.9	67.8	71.6	93.8	69.8	63.377934	84.028664	64.539787	4.5	13.2
3	62.7	87.8	136.9	61.9	87.2	138.7	66.507332	87.694168	125.087517	2.1	12.4
4	131.7	83.8	65.8	131.1	82.5	64.6	121.721565	78.528969	69.191719	1.9	11.8
5	61.1	97.0	105.2	62.1	96.5	106.5	55.112144	89.479988	99.073944	1.7	11.4
6	71.2	66.6	67.1	70.1	65.4	69.0	65.362091	56.500767	62.922848	2.5	12.4
7	69.5	98.1	112.4	72.1	99.6	113.9	64.195015	89.421608	100.860748	3.4	15.4
8	70.0	127.9	114.8	68.6	124.4	113.8	67.521965	116.666176	108.307312	3.9	13.2
9	95.6	40.5	84.4	85.1	43.4	82.1	92.237053	39.854797	79.850189	11.1	5.7

	Missing Values	% of Total Values
PatientPosition	0	0.000000
Contacts	56	6.466513
ElectrodeType	0	0.000000
PlanningX	0	0.000000
PlanningY	0	0.000000
PlanningZ	0	0.000000
PlanningRing	271	31.293303
PlanningArc	271	31.293303
DuraTipDistancePlanned	54	6.235566
EntryX	279	32.217090
EntryY	119	13.741339
EntryZ	131	15.127021
TipX	0	0.000000
TipY	0	0.000000
TipZ	0	0.000000
SkinSkullDistance	0	0.000000
SkullThickness	0	0.000000
SkullAngle	0	0.000000
ScrewLength	540	62.355658

	PatientPosition	Contacts	ElectrodeType	PlanningX	PlanningY	PlanningZ	DuraTipDistancePlanned	TipX	TipY	TipZ	SkinSkullDistance	SkullThickness	SkullAngle	TPLE
0	Supine	18	Oblique	125.8	106.5	135.5	86.6	126.4	106.8	135.2	7.0	9.7	70.3	0.7
1	Supine	18	Oblique	130.6	131.0	136.4	105.9	134.7	132.9	136.0	9.4	7.4	63.8	4.5
2	Supine	10	Oblique	139.1	104.9	124.0	35.2	139.1	108.3	124.4	9.8	5.5	66.4	3.4
3	Supine	10	Oblique	137.7	112.2	115.0	35.9	136.4	115.2	115.0	8.4	6.5	66.3	3.3
4	Supine	12	Oblique	126.0	76.0	124.1	39.5	124.6	75.8	126.8	7.4	4.6	84.7	3.0

name	index	count
PlanningY	None	78
PlanningX	None	17
DuraTipDistancePlanned	None	11
PlanningZ	None	8
SkullThickness	None	7
Contacts	None	6
SkullAngle	None	5
SkinSkullDistance	None	5
ElectrodeType	Oblique	2
PatientPosition	Supine	1